Named Entity Recognition in Persian Text using Deep Learning

Authors

Abstract:

Named entity recognition is a fundamental task in natural language processing and is regarded as a subtask of information extraction. It aims to find proper nouns in text and classify them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer that uses neural network-based approaches for both word representation and entity tagging. In the word representation part of the proposed model, two vector representations are used and compared: (1) a semantic representation of words based on their context, using the word2vec continuous skip-gram model, and (2) a semantic representation of words based on their context as well as the characters forming them, using fastText. While the former captures the semantic concepts of words, the latter also considers their morphological similarity. For entity identification, a deep Bidirectional Long Short-Term Memory (BiLSTM) network is used. An LSTM model takes the history of the text into account when predicting entities, and the BiLSTM model extends this idea by using context from both sides of each token. Moreover, in line with the present research, an annotated corpus containing 3,000 abstracts (90,000 tokens) from the Persian Wikipedia is provided. In contrast to the available datasets in the field, which include up to 7 label types, the new dataset contains 15 different labels, namely person individual, person group, organizations, locations, religions, books, magazines, movies, languages, nationalities, events, jobs, dates, fields, and other. This dataset is an important step in promoting future research in this field, especially for tasks such as question answering that need a wider range of entity types.
The results show that, using the introduced model and the provided data, the proposed system achieves an F-measure of 72.92.
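
To illustrate why a character-aware representation can capture morphological similarity, the following is a minimal sketch of fastText-style character n-gram extraction. It is not the authors' implementation: the function names `char_ngrams` and `ngram_overlap` are hypothetical, the n-gram range of 3 to 6 follows fastText's published defaults rather than any setting stated in the abstract, and the real library additionally hashes n-grams and sums their learned vectors, which is omitted here.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the set of character n-grams of `word`.

    The word is wrapped in '<' and '>' boundary markers first, so that
    prefixes and suffixes yield n-grams distinct from word-internal ones.
    (Hypothetical helper; range 3..6 is an assumed default.)
    """
    wrapped = f"<{word}>"
    grams = set()
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.add(wrapped[i:i + n])
    return grams


def ngram_overlap(a, b, min_n=3, max_n=6):
    """Jaccard overlap of the character n-gram sets of two words.

    A rough proxy for morphological similarity: words sharing a stem
    share many n-grams, unrelated words share few or none.
    """
    ga = char_ngrams(a, min_n, max_n)
    gb = char_ngrams(b, min_n, max_n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


if __name__ == "__main__":
    # "ایران" (Iran) and "ایرانی" (Iranian) share a stem; "کتاب" (book) does not.
    print(ngram_overlap("ایران", "ایرانی"))
    print(ngram_overlap("ایران", "کتاب"))
```

Because related word forms share many n-grams, a model built on such units can assign similar vectors to inflected or derived forms even when one of them is rare in the training corpus, which a purely word-level skip-gram model cannot do.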

Similar resources

PersoNER: Persian Named-Entity Recognition

Named-Entity Recognition (NER) is still a challenging task for languages with low digital resources. The main difficulties arise from the scarcity of annotated corpora and the consequent problematic training of an effective NER pipeline. To bridge this gap, in this paper we target the Persian language, which is spoken by a population of over a hundred million people worldwide. We first present ...

Deep Active Learning for Named Entity Recognition

Deep neural networks have advanced the state of the art in named entity recognition. However, under typical training procedures, advantages over classical methods emerge only with large datasets. As a result, deep learning is employed only when large public datasets or a large budget for manually labeling data is available. In this work, we show that by combining deep learning with active learn...

A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three methods have conventionally been used to extract named entities from a text: rule-based, machine-learning-based, and hybrid. Machine-learning-based methods perform well in the Persian language if they are trained with good features. To get good performanc...

Named Entity Recognition in Chinese Clinical Text

Dedication: Dedicated to my motivation to improve the health outcomes of the population by using state-of-the-art technologies, and to my dream to steer the research and development of biomedical informatics as a discipline in China. Acknowledgements: I would like to thank my advisor, Dr. Hua Xu, for guiding and supporting me over the years. You have set an example of excellence as a researche...

Product named entity recognition in Chinese text

There are many expressive and structural differences between product names and general named entities such as person names, location names and organization names. To date, there has been little research on product named entity recognition (NER), which is crucial and valuable for information extraction in the field of market intelligence. This paper focuses on product NER (PRO NER) in Chinese te...

Deep learning with word embeddings improves biomedical named entity recognition

Motivation: Text mining has become an important tool for biomedical research. The most fundamental text-mining task is the recognition of biomedical named entities (NER), such as genes, chemicals and diseases. Current NER methods rely on pre-defined features which try to capture the specific surface properties of entity types, properties of the typical local context, background knowledge, and li...

Journal title

volume 16, issue 4

pages 93-112

publication date 2020-03
